Cache write generate for parallel image processing on shared memory architectures

Authors

Craig M. Wittenbrink

Arun K. Somani

Chung-Ho Chen

Abstract

We investigate cache write generate, our cache mode invention. We demonstrate that for parallel image processing applications, the new mode improves main memory bandwidth, CPU efficiency, cache hits, and cache latency. We use register level simulations validated by the UW-Proteus system. Many memory, cache, and processor configurations are evaluated.

Similar resources

Cache Write Generate For High-Performance Processing

Much attention has been paid to read caching and several schemes have been developed to make read caching very efficient. As a result, the performance of write caching has become a concern. This paper investigates write caching policies and how they affect the performance of memory systems. We show that write caching can greatly alter the hit/miss ratios, but only more subtly affects the performanc...

Design of a Simulator for Large-Scale Distributed Shared-Memory Cache-Coherent Architectures

As the scale and the complexity of parallel computer systems grow rapidly, the study of interactions between application algorithms and parallel architectures becomes more important. Execution-driven simulation under realistic workloads proves to be an accurate and efficient technique for studying the performance of computer systems. However, direct-execution simulation of shared-memory cache-coh...

Parallel Conventional Systems versus Parallel Logic Programming Systems on Distributed Shared Memory Architectures

Distributed shared memory architectures have been the object of research by many computer science groups. Research ranges broadly from hardware-based coherence protocols to DSM software protocols on networks of workstations, passing through high-technology interconnection networks that reduce network latency. In this work we thoroughly investigate how different hardware cache coherence protocols affect ...

Memory Latency in Distributed Shared-Memory Multiprocessors

Analytical models were developed and simulations of memory latency were performed for Uniform Memory Access (UMA), Non-Uniform Memory Access (NUMA), Local-Remote-Global (LRG), and Replicated Concurrent-Read (RCR) architectures for hit rates from 0.1 to 0.9 in steps of 0.1, memory access times of 10 nsec to 100 nsec, proportions of read/write access from 0.01 to 0.1, and block sizes of 8 to ...

Parallel Processing Using the Silicon Graphics / Cray Origin 2000

The Origin 2000 is a high performance computing platform produced jointly by Silicon Graphics / Cray. This scalable shared memory processor (SSMP) may be configured with up to 128 processors in a single system image. The Origin is a scalable, cache coherent, non-uniform memory access (CC-NUMA), distributed shared memory (DSM) architecture based on a hypercube interconnection topology. Effective...

Journal:

IEEE transactions on image processing : a publication of the IEEE Signal Processing Society

Volume 5, Issue 7

Pages: -

Publication date: 1996
